Pop Quiz:
Using different views in analysis 


What does this look like?


A circle with a dot in the center? 
A sphere with a hole through the center? 
It could be this...
Or it could be this...
A single view can mislead you...
Conclusion


Probabilistic Risk Assessment (PRA)
is a tool to help you assess the risk by
looking at systems and operations from a
different view, both quantitatively and
qualitatively.


Given our available budget and time, 
we must be smart and efficient in how 
and what we do. That’s where PRA 
can make a difference. 


Questions? 


Introduction


Probabilistic Risk Assessment (PRA) is one of the tools 
in NASA’s Safety & Mission Assurance (S&MA) toolbox. 
It provides both depth and width in evaluating systems, 
vehicles, vessels, facilities, and missions. 


NASA continues to get budgets with high expectations 
from the public. S&MA must continue to do its job with 
less, thus we have to be smarter and more efficient. 


PRA has been used successfully in several industries, 
such as commercial nuclear power, aerospace, 
transportation, chemical, and medical. 


BSEE has hired NASA’s Johnson Space Center (JSC) to 
use its PRA experience to develop a PRA procedures 
guide for the Oil & Gas industry and to develop several 
example applications. 




Oil & Gas Examples 


° Facility Level Risk Assessment 
— Deepwater Drilling Operation 
— Shallow Water Drilling Operation 
— Subsea Oil Production 
— Rigs and Platforms 


° System Level Risk Assessment 
— Blowout Preventer (BOP) 
— Dynamic Positioning System (DPS) 
— Mud Systems 


° Focused risk trade studies between current and proposed 
process/design. For example: 


— Evaluate the proposed requirement for additional subsea accumulator bottles in 
the Well Control Rule for a five year time frame vs. the existing system in API 
STD-53. 


— Comparing different BOP ram drivers and sealing. 
— Evaluating operational work arounds given an initiating event, such as bolt failure. 
What is PRA?


° PRA is a comprehensive, structured, and disciplined approach to 
identifying and analyzing risk in engineered systems and/or processes. 
It attempts to quantify rare event probabilities of failures. It attempts to 
take into account all possible events or influences that could reasonably 
affect the system or process being studied. It is inherently and 
philosophically a Bayesian methodology. In general, PRA is a process 
that seeks answers to three basic questions: 


✓ What kinds of events or scenarios can occur (i.e., what can go
wrong)?

✓ What are the likelihoods and associated uncertainties of the events
or scenarios?

✓ What consequences could result from these events or scenarios
(e.g., Loss of Crew, Loss of Mission, Loss of Hydrocarbon
Containment, Reactor Core Damage Frequency)?
° There are other definitions and questions that it can help answer.


° The models are developed in “failure space”. This is usually different 
from how designers think (e.g. success space). 


° PRAs are often characterized by (but not limited to) event tree models, fault 
tree models, and simulation models. 
When can PRA be Performed?


NEW DEVELOPMENTS 

The ideal time to conduct a PRA is at the beginning of the design process 
to incorporate the necessary safety and risk avoidance measures 
throughout the development phase at minimal cost. 




EXISTING SYSTEMS 

PRA can be applied to existing systems to identify and prioritize risks 
associated with operations. Risk assessments can evaluate the impact of 
system changes and help avoid compromises in quality or reliability while 
increasing productivity. 
INCIDENT RESPONSE

In the event of unexpected downtime or an accident, our team can assess 
the cause of the failure and develop appropriate mitigation plans to 
minimize the probability of comparable events in the future. 
In a nutshell, PRA can be applied from concept to decommissioning
during the life cycle, including design and operations. 


May 2016


Some Background


° In the late fifties / early sixties, Boeing and Bell Labs developed Fault Trees
to evaluate launch systems for nuclear weapons, and early approaches
to human reliability analysis began.


° NASA experimented with Fault Trees and made some early attempts to do
Probabilistic Risk Assessment (PRA) in the sixties (most notably on the
Apollo program) but then abandoned it and reduced quantitative risk
assessment.


° The nuclear power industry picked up the technology in the early seventies and
created WASH-1400 (Reactor Safety Study) in the mid seventies.


— This is considered the first modern PRA 


— Was shelved until Three Mile Island (TMI) incident happened in 1979. It was 
determined that the WASH-1400 study gave insights to the incident that could not 
be easily gained by any other means. 


° PRA is now practiced by all commercial nuclear plants in the United 
States and a large amount of data, methodology and documentation for 
PRA technology has been developed by the industry and the Nuclear 
Regulatory Commission (NRC) 


— All new Nuclear Plants must license their plants based on PRA as well as “Defense 
In Depth” concepts. 


— The NRC practices its oversight responsibility of the commercial nuclear industry 
using a “Risk” based approach that is heavily dependent on PRA. 


— SAPHIRE (Systems Analysis Programs for Hands-on Integrated Reliability Evaluations) is a PRA
software tool developed by the Idaho National Lab for the U.S. NRC and also used by NASA.
PRA Overview


PRA Process 


Probabilistic Risk Assessment Flow

[Figure: PRA process flow diagram. The labels that can be recovered from the slide are listed below.]

End States (list of consequences of interest). Examples:
* Loss of life
* Loss of facility
* Fire
* Blowout
* Leak

Inputs to the model:
* Sequences of operation
* Timelines
* Operational Procedures
* Malfunction Procedures
* Rules/Assumptions
* Training Manuals
* System Architecture
* P&IDs
* Engineering Expertise
* Human Error
* Common Cause
* External event

Engineering Analysis used to support success criteria, response time, etc.

Supporting sources:
* Hazard Reports
* Functional Analyses
* FMEAs
* Previous risk assessments
* Other Assessments

Data sources:
* Customer Data
* Industry Databases
* ICON
* Well Master

Outputs:
* Risk Levels for Selected End States
* Relative Risk Drivers

Documentation of the PRA supports a successful independent review process
and long-term PRA application
The PRA Team


° A PRA system analysis team includes both system domain
experts and PRA analysts. The key to success is multi-way 
communication between the PRA analysts, domain experts, 
and management. 


° A majority of PRA analysts have engineering degrees with 
operations and/or design backgrounds in order to understand 
how systems work and fail. This is essential in developing the 
failure logic of the vehicle or facility. 


* Good data analysts understand how to take the available data 
to generate probabilities and their associated uncertainty for 
the basic events that the modelers can use or need. 
° Building or developing a PRA involves:
— understanding its purpose and the appropriate modeling techniques,
— designing how it will serve that purpose,
— populating it with the desired failure logic and probabilities, and
— troubleshooting it (nothing works the first time)
PRA Development Process

[Figure: flow diagram of the PRA development process. The panel titles and notes that can be recovered from the slide are listed below in flow order.]

° Defining the PRA Study Scope and Objectives
° Initiating Events Identification
° Event Sequence Diagram (Inductive Logic), leading to end states such as OK, LOM, and LOC
° Event Tree (ET) Modeling
° Fault Tree (FT) System Modeling, built from logic gates and basic events, with links to other fault trees
° Mapping of ET-defined Scenarios to Causal Events:
* Internal initiating events
* External initiating events
* Hardware failure
* Human error
* Software error
* Common cause failure
* Environmental conditions
* Other
° Probabilistic Treatment of Basic Events: the uncertainty in occurrence frequency of an event
is characterized by a probability distribution. Examples (from left to right in the figure):
* Probability that the hardware x fails when needed
* Probability that the crew fail to perform a task
* Probability that there would be a windy condition at the time of landing
° Model Logic and Data Analysis Review: Domain Experts ensure that system failure logic
is correctly captured in the model and appropriate data is used in data analysis
° Model Integration and Quantification of Risk Scenarios: integration and quantification of
logic structures (ETs and FTs) and propagation of epistemic uncertainties to obtain:
* minimal cutsets (risk scenarios in terms of basic events)
* likelihood of risk scenarios
* uncertainty in the likelihood estimates
° Technical Review of Results and Interpretation
° Communicating & Documenting Risk Results and Insights to Decision-maker:
* Displaying the results in tabular and graphical forms
* Ranking of risk scenarios
* Ranking of individual events (e.g., hardware failure, human errors, etc.)
* Insights into how various systems interact
* Tabulation of all the assumptions
* Identification of key parameters that greatly influence the results
* Presenting results of sensitivity studies
* Proposing candidate mitigation strategies
PRA Development Process (2)


° Define the scope of the PRA


— Start with the end in mind or the question you want answered. For 
example, loss of hydrocarbon containment and loss of life failure end 
states 

— Define mission scope 

— Establish the mission/operational phases and lay out the mission level
event trees and corresponding top events to be analyzed 


° Develop logic models 
— Assign top events to system analysts for each subsystem and work with 
domain experts to develop fault trees 
— System analysts work with data analysts and domain experts to 
determine level of detail and failure logic (develop fault trees to the level 
that data exists) 


— Obtain appropriate project office concurrence of system models (fault
trees)
PRA Development Process (3)


° Develop failure data into failure probabilities 
— Obtain specific failure history or best available generic data 


— Data analysts calculate failure probabilities based on best available data 
and approved methods 




° Quantify the model, perform sanity checks, re-iterate until Team 
is in agreement 
— Quantify the integrated model and perform sanity checks to determine 
which simplifying model assumptions need to be re-evaluated, where 
uncertainties need to be narrowed, where additional deterministic 
analyses are needed 




° Share results with program and projects
— Risk ranking and risk insights 
— Incorporate feedback into PRA and into program/project design/ops 


— Maintain “Living PRA” to represent new program information (data 
updates) and evolving model scope 
Simple Example of a Small PRA model


The spacecraft is designed with two redundant
sets of thrusters (independent of each other)


Each propellant distribution module consists of
a hydrazine tank, filters, distribution lines,
normally-open isolation valves, sensors, 
heaters, etc. (only components that affect 
mitigation of leaks are shown) 


When thruster operation is needed, the 
controller opens the solenoid valves (not 
shown) to allow hydrazine to flow 


The controller monitors the pressure of feedlines
via pressure transducers (P1 and P2). It is
designed to differentiate between the normal 
thruster operation and a leak 


In the event of a leak, isolation valves (V1 and 
V2) should both close 

Successful termination of the leak leads to the
loss of one, but not both, thruster sets

Failure to terminate the leak can cause damage 
to the flight critical avionics and/or damage to 
scientific equipment: 


- Hydrazine acts as a wire stripper and is 
corrosive 


[Figure: Simplified Schematic of Propellant Distribution Module, showing the pressure transducers and isolation valves on the feed to one set of thrusters]
Example of Event Sequence Diagram (ESD)

[Figure: event sequence diagram for the hydrazine leak. Recoverable node labels: Hydrazine leaks; Leak not detected; Leak not isolated; damage to flight critical avionics; damage to scientific equipment. Branches are labeled with yes/no.]

Notes on the figure:
* Better viewed as good things are up or to the right and bad things are down (i.e., success is up or to the right and failure is down)
* Statements are made under different conditions
The ESD Translated Into an Event Tree

[Figure: event tree derived from the ESD, including the pivotal event “Leak not detected” (LD)]
Fault Trees Are Attached to the Event Tree

[Figure: event tree with the headings Hydrazine leaks, Leak not detected, damage to flight critical avionics, damage to scientific equipment, and End state. A fault tree attached to the pivotal event “Leak not detected” contains the basic events: Controller fails; Common cause failure of P transducers; Pressure transducer 1 fails; Pressure transducer 2 fails.]

JSC S&MA Analysis Branch

PRA model embodies a collection of
various models (logic, reliability,
simulation and physical, etc.) in an
integrated structure
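
As a rough illustration of how a fault tree feeds an event tree, the sketch below quantifies the hydrazine leak example. Every number is hypothetical, and the gate layout (controller failure OR common cause failure OR both transducers failing) is an assumption made for this sketch, since the slide lists only the basic events. The three sequences are likewise a simplification of the event tree.

```python
import math


def and_gate(*probs):
    """All independent inputs must fail: multiply the probabilities."""
    return math.prod(probs)


def or_gate(*probs):
    """At least one independent input fails: 1 minus the product of successes."""
    return 1.0 - math.prod(1.0 - p for p in probs)


# Hypothetical values chosen only for illustration; none come from the slides.
P_LEAK = 1.0e-2            # initiating event: hydrazine leaks
P_CONTROLLER = 1.0e-4      # controller fails
P_CCF = 1.0e-4             # common cause failure of both pressure transducers
P_P1 = 1.0e-3              # pressure transducer 1 fails
P_P2 = 1.0e-3              # pressure transducer 2 fails
P_NOT_ISOLATED = 2.0e-3    # isolation valves fail to close, given detection

# Assumed fault tree for the pivotal event "Leak not detected".
p_not_detected = or_gate(P_CONTROLLER, P_CCF, and_gate(P_P1, P_P2))

# Simplified event tree: each sequence is the product of its branch probabilities.
sequences = {
    "detected and isolated (one thruster set lost)":
        P_LEAK * (1.0 - p_not_detected) * (1.0 - P_NOT_ISOLATED),
    "detected but not isolated (damage possible)":
        P_LEAK * (1.0 - p_not_detected) * P_NOT_ISOLATED,
    "not detected (damage possible)":
        P_LEAK * p_not_detected,
}

for name, prob in sequences.items():
    print(f"{name}: {prob:.3e}")
```

The sequence probabilities sum to the initiating event probability, which is a quick sanity check on any event tree.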
Types of Data that Exist in the Models


° Functional — A functional failure event is generally defined as failure of a 
component type, such as a valve or pump, to perform its intended function. 
Functional failures are specified by a component type (e.g., motor pump) and 
by a failure mode for the component type (e.g., fails to start). Functional 
failures are generally defined at the major component level such as Line 
Replaceable Unit (LRU) or Shop Replaceable Unit (SRU). Functional failures 
typically fall into two categories, time-based and demand-based. Estimates receive
a Bayesian update as Shuttle-specific data becomes available.


° Phenomenological — Phenomenological events include non-functional events 
that are not solely based on equipment performance but on complex 
interactions between systems and their environment or other external factors 
or events. Phenomenological events can cover a broad range of failure 
scenarios, including leaks of flammable/explosive fluids, engine burn through, 
over pressurization, ascent debris, structural failure, and other similar 
situations. 


° Human — Three types of human errors are generally included in fault trees:
pre-initiating event, initiating event (or human-induced initiators), and
post-initiating event interactions.
° Common Cause — Common Cause Failures (CCFs) are multiple failures of
similar components within a system that occur within a specified period of time
due to a shared cause.


° Conditional — A probability that is conditional upon another event, i.e., given
that an event has already happened, what is the probability that successive
events will fail?
Common Cause Modeling


(More details and examples in Back-up Charts) 


° All large PRAs of complex and redundant machines must include 
“common cause” effects to be complete and accurate 


° Common causes are those conditions that defeat the benefits of
redundancy 
— Not “single point failures” 
— Similar to “generic cause” 


° There are three recognized ways to perform common cause modeling: 
— The Beta Model 
— The Multiple Greek Letter Model 
— The Alpha Model 


° We use an iterative approach to modeling common cause: first the
Beta Model approach is used, and if it shows up as a risk driver, a
Multiple Greek Letter Model is used
° Generic data from NUREG/CR-5485 is used for the majority of the events, since
there are few cases where there is enough Shuttle data to develop
Shuttle-specific values

— RCS Thrusters and ECO sensors are examples of cases where Shuttle specific 
data is used to calculate the common cause parameters 
Unknown and Underappreciated Risks


° Risk model completeness has long been recognized as a 
challenge for simulated methods of risk analysis such as PRA as 
traditionally practiced. 




° These methods are generally effective at identifying system 
failures that result from combinations of component failures that 
propagate through the system due to the functional dependencies of 
the system that are represented in the risk model. 


° However, they are typically ineffective at identifying system failures 
that result from unknown or underappreciated (UU) risks, 
frequently involving complex intra- and inter-system interactions that 
may have little to do with the intentionally engineered functional 
relationships of the system. 
Unknown and Underappreciated Risks
(Cont'd) 


° Earlier in 2009, the NASA Advisory Council noted the following set of 
contributory factors: 


— Inadequate definitions prior to agency budget decision and to external 
commitments 


— optimistic cost estimates/estimating errors 

— inability to execute initial schedule baseline 

— Inadequate risk assessments 

— higher technical complexity of projects than anticipated 
— changes in scope (design/content) 

— Inadequate assessment of impacts of schedule changes on cost 
— annual funding instability 

— eroding in-house technical expertise

— poor tracking of contractor requirements against plans 
— Reserve position adequacy 

— lack of probabilistic estimating 

— “go as you can afford” approach 


— lack of formal document for recording key technical, schedule, and programmatic 
assumptions. 
Notional PRA Examples


First the Math


1.0E-02 = 0.01 = 1:100 (Probable): roughly the Shuttle mission risk

1.0E-06 = 0.000001 = 1:1,000,000 (Improbable): roughly 20 coins simultaneously landing on tails

1.0E-12 = 0.000000000001 = 1:1,000,000,000,000 (Ridiculous)
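
The conversions above are simple to reproduce. The short sketch below, with a helper name (`one_in_n`) introduced only for this illustration, turns a probability into "1 in N" odds and checks the coin comparison.

```python
def one_in_n(probability):
    """Express a probability as the N in '1 in N'."""
    return 1.0 / probability


for p in (1.0e-2, 1.0e-6, 1.0e-12):
    print(f"{p:.1e} = 1 in {one_in_n(p):,.0f}")

# Twenty fair coins all landing on tails: (1/2)^20.
p_twenty_tails = 0.5 ** 20
print(f"20 tails in a row: {p_twenty_tails:.2e} = 1 in {one_in_n(p_twenty_tails):,.0f}")
```

Twenty tails works out to about 1 in 1.05 million, which is why it serves as a stand-in for 1.0E-06.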
Time Perspective


1.2 x 10^14 hours ago: ~14 billion years ago
4 x 10^13 hours ago: ~4.5 billion years ago
2 x 10^12 to 7 x 10^11 hours ago: ~228 to 80 million years ago
4 x 10^8 hours ago: ~46,000 years ago
2.1 x 10^6 hours ago: ~240 years ago
6.3 x 10^5 hours ago: ~72 years ago
Uncertainty Distribution


° This distribution is a representation of the uncertainty associated with a PRA’s results
° The median is also referred to as the 50th percentile

[Figure: uncertainty distribution of the result, plotted against probability from 4.0E-03 to 2.4E-02, with the following points marked]

5th percentile: 7.9E-03 (1:127)
Median: 1.1E-02 (1:94)
Mean: 1.1E-02 (1:90)
95th percentile: 1.6E-02 (1:63)

° The 5th and 95th percentiles are common points on a distribution to show the range that 90%
of the estimated risk lies between.

° The mean is a common measure of risk that accounts for the uncertainty of this distribution, and is thus
the value or metric used to verify LOC requirements.
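
A distribution like the one above is typically obtained by sampling the uncertain basic event probabilities and re-evaluating the model many times. The sketch below is a minimal Monte Carlo illustration under stated assumptions: the three-event model `(A AND B) OR C`, the medians, and the error factors are all hypothetical, and each basic event is assumed to be lognormally distributed.

```python
import math
import random
import statistics

Z95 = 1.645  # standard normal 95th percentile


def sample_lognormal(rng, median, error_factor):
    """Draw a probability from a lognormal defined by its median and error factor."""
    sigma = math.log(error_factor) / Z95
    return min(1.0, rng.lognormvariate(math.log(median), sigma))


def top_event(p_a, p_b, p_c):
    """Hypothetical model: (A AND B) OR C, with independent events."""
    return 1.0 - (1.0 - p_a * p_b) * (1.0 - p_c)


rng = random.Random(2016)  # fixed seed so the run is repeatable
samples = sorted(
    top_event(
        sample_lognormal(rng, 1.0e-2, 3.0),
        sample_lognormal(rng, 1.0e-2, 3.0),
        sample_lognormal(rng, 1.0e-3, 3.0),
    )
    for _ in range(20000)
)

mean = statistics.fmean(samples)
median = statistics.median(samples)
p05 = samples[int(0.05 * len(samples))]
p95 = samples[int(0.95 * len(samples))]

print(f"5th percentile:  {p05:.2e} (1:{1 / p05:,.0f})")
print(f"median:          {median:.2e} (1:{1 / median:,.0f})")
print(f"mean:            {mean:.2e} (1:{1 / mean:,.0f})")
print(f"95th percentile: {p95:.2e} (1:{1 / p95:,.0f})")
```

Because the result is right-skewed, the mean sits above the median, which matches the ordering shown on the slide.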
[Figure labels: System 1; System 2; Human Error; Conditional Failure]
Showing Uncertainty wrt Requirements


Notional

[Figure: notional bar chart on a logarithmic probability axis (1/10000, 1/1000, 1/100, 1/10). Values legible on the bars include 1 in 2,500, 1 in 1,600, 1 in 1,000, 1 in 500, 1 in 200, 1 in 150, 1 in 30, 1 in 18, and 1 in 10.]
Green Bar shows Requirement Value is met
Red Bar shows Requirement Value is not met
Notional Risk Drivers via Pareto
(e.g. Top 80% of Calculated Risk)


A Pareto chart like this can be made for each project, rig, platform, etc.

[Figure: Pareto chart with an overall “1 in xxx Risk” label, various subsystems and scenarios on one axis, and % of Risk on the other]
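
The ranking behind such a chart can be sketched in a few lines. The subsystem names and risk contributions below are hypothetical placeholders, and the 80% cutoff follows the slide title.

```python
# Hypothetical risk contributions (probability of the end state per mission).
contributions = {
    "Subsystem A": 4.5e-4,
    "Subsystem B": 2.5e-4,
    "Subsystem C": 1.5e-4,
    "Subsystem D": 0.8e-4,
    "Subsystem E": 0.4e-4,
    "Subsystem F": 0.3e-4,
}

total = sum(contributions.values())
ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)

# Walk down the ranked list until the cumulative share reaches 80%.
drivers = []
cumulative = 0.0
for name, risk in ranked:
    cumulative += risk
    drivers.append((name, risk / total, cumulative / total))
    if cumulative / total >= 0.80:
        break

print(f"Total risk: 1 in {1 / total:,.0f}")
for name, share, running in drivers:
    print(f"{name}: {share:.0%} of risk (cumulative {running:.0%})")
```

With these placeholder values, three of the six contributors account for the top 80% of the calculated risk.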
In Closing


° There is much more to know about PRA than what you’ve seen
today. This presentation was meant to give you enough insight to ask
the right questions when you are trying to decide:
o whether you need a PRA or not,
o whether it is being performed properly and by qualified analysts,
o whether it is answering the question(s) you need answered.


° PRA (with the help of deterministic analyses) identifies and ranks
the risk contributors; the FMEA analysts and Reliability Engineers
can then help solve the problem by focusing on the top risk drivers.




Backup Charts
Acronyms and Definitions


Cut set: Those combinations of items that can cause a failure of the type that you are 
interested in. A “minimum cutset” is the minimum combination of items necessary to 
cause the failure of interest. 


End State: The consequence of interest that is defined for what your model is supposed to 
calculate (sometimes will be referred to as a Top event or Figure of merit depending on 
model type). 


Top event (Top): The top event in a fault tree or a pivotal event in an event tree. If an 
event tree uses a linked fault tree to calculate a pivotal event then the pivotal event name 
and Fault tree “Top” name need to be identical. 


MLD: Master Logic Diagram. Used to identify all possible initiators. 


Event Tree: A logic tool that is used to model inductive logic and quantify models using 
Boolean logic. Can be linked to other event trees and can use fault trees linked to it. 


Fault Tree: A logic tool that is used to build deductive models of equipment or processes 
and is quantified with Boolean Logic. Can be linked to Event Trees for a linked fault tree 
model. Built from top down and quantified from bottom up. 


PRA: Probabilistic Risk Assessment: A technique used for evaluating rare events for 
complex systems or processes. Attempts to account for all possible events that can cause 
the “end state”, “Top event”, “Figure of Merit”. Uses fault trees, event trees and other 
methods to “infer” the probability of events of interest. Better definition later. 


Rare Event: An event that has a small probability of happening. From a data point of view, 
it will have never been seen in practice or seen only rarely. It will not have enough data to 
be statistically significant. From the “rare event approximation” point of view it is a
probability that is 0.1 or less. 
Acronyms and Definitions


(continued)


LOC: Loss of Crew: A common “end state”, “top event”, consequence, or “Figure of
Merit” that we are interested in at NASA.


LOM: Loss of Mission: A common “end state”, “top event”, consequence, or “Figure of
Merit” that we are interested in at NASA.


Risk: Probability or Frequency, times consequences 


“And” gate: A logic symbol used in Fault Trees that multiplies inputs to it. In Boolean
algebra it defines the “intersection” of events that are put into it.


“Or” gate: A logic symbol used in Fault Trees that adds inputs to it. More accurately, in
Boolean algebra it is the “union” of events that are put into it.


Bathtub Curve: This is a curve shaped like a bathtub that represents infant mortality or 
break-in failures early in a component’s or system’s life and wear-out or aging late in life
with a relatively constant or flat line connecting them. The flat line or constant failure 
rate implies that failure rates are random and independent of time. 

Infant mortality: The portion on the bathtub curve that is on the front end showing that 
failure rates are improving (becoming smaller) as time increases. 

Aging: The Portion on the Bathtub curve that is on the back end that shows the failure 
rates increasing as components wear out or age. 

Exponential Distribution: This is the distribution or equation that we use to represent
the flat part of the bathtub curve (constant failure rate) and our PRA models that rely on
the failure rates being random with respect to time. For reliability it is e^(-λt) and in failure
space, it is 1 - e^(-λt), where λ is the constant failure rate and t is time.
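
The gate and exponential definitions above can be combined into a small worked sketch. The failure rate, mission time, and input probabilities are hypothetical, the function names are introduced only for this illustration, and all events are assumed independent.

```python
import math


def reliability(rate_per_hour, hours):
    """Success space: probability of surviving the interval, e^(-lambda * t)."""
    return math.exp(-rate_per_hour * hours)


def failure_probability(rate_per_hour, hours):
    """Failure space: 1 - e^(-lambda * t)."""
    return -math.expm1(-rate_per_hour * hours)


def and_gate(probs):
    """Intersection of independent events: multiply the inputs."""
    return math.prod(probs)


def or_gate(probs):
    """Union of independent events: 1 minus the product of the complements."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def or_gate_rare_event(probs):
    """Rare event approximation: simply add the inputs (slightly conservative)."""
    return sum(probs)


# Hypothetical component: 1.0E-05 failures per hour over a 100 hour mission.
p_fail = failure_probability(1.0e-5, 100.0)
print(f"P(fail) = {p_fail:.4e}, reliability = {reliability(1.0e-5, 100.0):.6f}")

inputs = [0.01, 0.02]
print(f"AND gate: {and_gate(inputs):.4e}")
print(f"OR gate (exact): {or_gate(inputs):.4e}")
print(f"OR gate (rare event approximation): {or_gate_rare_event(inputs):.4e}")
```

For small inputs the exact OR result and the simple sum are nearly identical, which is why "adds inputs" is a workable description of an OR gate when the probabilities are 0.1 or less.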
Acronyms and Definitions


(continued) 


18. Time Rate of Failure: Failures that are defined as a rate of failure per time interval (e.g. 
failures per hour) 

19. Demand Failure: Failures that are defined as a failure per demand. 

20. Conditional Probability: This is a probability of occurrence that is pre-conditioned on a
specific set of circumstances that precedes it or is concurrent with it.

21. Frequency: This is a rate (usually per time but can be defined per other parameters such
as demands etc.). This is a number greater than 0 but not necessarily less than 1.

22. Probability: Dimensionless number between 0 and 1. Describes the likelihood of 
something happening. 

23. Minimal Cutset: A “minimum cutset” is the minimum combination of items necessary 
to cause the failure of interest. 

24. ESD: Event Sequence Diagram: This is a tool sometimes used to help explain the flow 
of an event or events and can be directly represented by an event tree. It uses 
inductive logic. Relatively few computer software programs will quantify ESDs. 

25. Lambda: This is a rate of failure. Often written with the Greek symbol λ. Most of the time this
will be a time rate of failure but can also be used to represent a “demand rate of
failure”.

26. λ: Greek letter Lambda often used to show a failure rate.
Acronyms and Definitions


(continued) 


27. Lognormal Distribution: This is a distribution of events that, if graphed on log paper, would
show a normal distribution. It is a distribution often used in the PRA world to define the
uncertainty of Lambda (λ).


28. EF (Error Factor): This is a parameter used to help define the width of a lognormal
distribution. It is defined as the 95th/50th = 50th/5th = Square root of 95th/5th. We will
often approximate a result of an uncertainty evaluation with a Lognormal distribution
when it is in fact not a lognormal or any other kind of distribution but a lognormal does a good
job of approximating it. In such cases we always try to use the definition of EF = Square root
of 95th/5th.


29. Fussell-Vesely (FV): Fussell-Vesely importance measure. Represents how much a
component’s failure is contributing to the Top event or end state. It is often expressed as a
percentage, although it is not strictly one; this will be covered later.


30. Risk Increase Ratio (RIR): This is another importance measure that will tell you how much a
Top Event or End State will increase if you set an item’s probability of failure to 1 and
recalculate the end state or top event. It is equivalent to RAW.


31. Risk Achievement Worth (RAW): This is another importance measure that will tell you how
much a Top Event or End State will increase if you set an item’s probability of failure to 1 and
recalculate the end state or top event. It is equivalent to RIR.
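
As a worked illustration of the error factor in item 28, the sketch below fits a lognormal to the 5th and 95th percentiles quoted on the Uncertainty Distribution slide (7.9E-03 and 1.6E-02). It is only an approximation in the sense described above: the fitted median and mean are those of the lognormal stand-in and are not expected to reproduce the slide's values exactly. The function names are introduced only for this illustration.

```python
import math

Z95 = 1.645  # standard normal 95th percentile


def error_factor(p05, p95):
    """EF = square root of (95th percentile / 5th percentile)."""
    return math.sqrt(p95 / p05)


def lognormal_parameters(p05, p95):
    """Return (mu, sigma) of the lognormal passing through both percentiles."""
    ef = error_factor(p05, p95)
    median = math.sqrt(p05 * p95)  # geometric midpoint of the two percentiles
    return math.log(median), math.log(ef) / Z95


def lognormal_mean(mu, sigma):
    """Mean of a lognormal: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2.0)


ef = error_factor(7.9e-3, 1.6e-2)
mu, sigma = lognormal_parameters(7.9e-3, 1.6e-2)
fitted_median = math.exp(mu)
fitted_mean = lognormal_mean(mu, sigma)

print(f"EF = {ef:.3f}")
print(f"fitted median = {fitted_median:.3e}, fitted mean = {fitted_mean:.3e}")
```

An error factor near 1.4 indicates a fairly narrow distribution; basic event data with little operating history often carries an EF of 3 to 10.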
Acronyms and Definitions


(continued) 


32. Risk Reduction Ratio (RRR): This is another importance measure that will tell
you how much a Top Event or End State will decrease if you set an item’s
probability of failure to 0 and recalculate the end state or top event. It is
equivalent to RRW.


33. Risk Reduction Worth (RRW): This is another importance measure that will tell
you how much a Top Event or End State will decrease if you set an item’s
probability of failure to 0 and recalculate the end state or top event. It is
equivalent to RRR.


34. Common Cause Failure (CCF): This is a failure cause that can result in multiple
failures of identical redundant equipment within a short time span, therefore
reducing the advantage of having redundant equipment (e.g., contaminated
lube oil fails multiple pumps in a redundant system).


35. Big Stew (BS) extra credit: This is a method defined by the incredibly brilliant 
Mark Bigler and Mike Stewart in order to model inter-phase dependencies 
using a linked fault tree model. The only reason Bigler is allowed to have top 
billing is so we can get a good and memorable Acronym (BS). It is also okay to 
consider the Big in “Big Stew” to be a modifier of Stew. 
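
The importance measures defined in items 29 through 33 can be illustrated on a small set of minimal cut sets. The sketch below is a hedged example: the cut sets and probabilities are hypothetical, and the ratio forms used (FV as the fractional drop when the event is set to 0, RAW as the ratio with the event set to 1, RRW as the ratio with the event set to 0) are commonly used conventions assumed here, since the definitions above are given only in words.

```python
import math

# Hypothetical minimal cut sets and basic event probabilities.
CUT_SETS = [("controller",), ("ccf_transducers",), ("p1", "p2")]
BASE = {"controller": 1.0e-4, "ccf_transducers": 1.0e-4, "p1": 1.0e-3, "p2": 1.0e-3}


def top_probability(probs):
    """Min cut upper bound: 1 - product over cut sets of (1 - cut set probability)."""
    return 1.0 - math.prod(
        1.0 - math.prod(probs[event] for event in cut_set) for cut_set in CUT_SETS
    )


def importance(event):
    """Recalculate the top event with the event forced to 1 and to 0."""
    base = top_probability(BASE)
    with_failed = top_probability({**BASE, event: 1.0})
    with_perfect = top_probability({**BASE, event: 0.0})
    return {
        "FV": (base - with_perfect) / base,
        "RAW": with_failed / base,
        "RRW": base / with_perfect,
    }


for name in BASE:
    measures = importance(name)
    print(f"{name}: FV={measures['FV']:.3f} "
          f"RAW={measures['RAW']:.1f} RRW={measures['RRW']:.3f}")
```

In this toy model the single-event cut sets dominate, so the controller has a far higher FV than either transducer taken alone.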
Common Cause


Definition Of Common Cause Failure (CCF) 

some basics 

Types Of CCF Models 

Examples of common cause 

Deriving common cause parameter values from data 


Examples of Beta’s calculated from real data (NASA 
and Nuclear) 


Conclusions 
Common Cause Modeling (2)


HOW THE BETA MODEL APPROACH WORKS 


° Susceptibility groups (groupings of similar or identical equipment) of 
redundant trains or components are identified 


° A common cause basic event is defined for these groups


° The common cause basic event failure rate is generated by taking the 
independent failure rate times a “Beta” factor. 
— For the beta model it does not matter how many components are in the group 


— The “Beta” factor represents the probability of 2 or more failures given a failure has 
occurred 
> For this reason, the Beta Model may be conservative for component groups larger than 2. 
° The “Beta” factor is taken from NUREG/CR-5485 and has a different
value for “Operating” failures vs. “Demand” failures
— For operating failures, the “Beta” value is 0.0235
— For demand failures, the “Beta” value is 0.047
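
The Beta Model step described above can be sketched in a few lines of Python. This is only an illustration: the function name and the valve failure probability are hypothetical, and the two Beta values are the ones quoted above from NUREG/CR-5485.

```python
# Beta factors quoted above from NUREG/CR-5485.
BETA_OPERATING = 0.0235
BETA_DEMAND = 0.047


def common_cause_value(independent_value: float, beta: float) -> float:
    """Beta Model as described above: independent failure rate
    (or probability) times the Beta factor.

    The group size is not an input, because the Beta Model does not
    depend on how many components are in the group.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    return independent_value * beta


# Hypothetical demand failure probability for one valve in a redundant group.
valve_fails_on_demand = 1.0e-3
ccf = common_cause_value(valve_fails_on_demand, BETA_DEMAND)
print(f"Common cause basic event: {ccf:.1e}")
```

With these inputs the common cause basic event comes out at about 4.7E-5.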
Common Cause Modeling (3)


HOW THE MULTIPLE GREEK LETTER MODEL APPROACH WORKS


° Similar to the Beta Model, except that the Multiple Greek Letter Model takes credit
for the full redundancy and therefore can be much more complicated


— For a 3-component group, there is a “beta” factor and a “gamma” factor, where
the “beta” factor is still the probability of 2 or more failures given a failure has occurred, and the “gamma” factor
is the probability of 3 or more failures given 2 or more failures.
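
As a rough sketch of how the beta and gamma factors split a total failure probability for a 3-component group, the Python below uses the relations commonly given for the Multiple Greek Letter model: Q1 = (1 - beta) Qt, Q2 = (1/2) beta (1 - gamma) Qt, Q3 = beta gamma Qt. The function name, the total failure probability, and the gamma value are hypothetical, chosen only for illustration.

```python
def mgl_three_component(q_total: float, beta: float, gamma: float):
    """Split the total failure probability of one component in a
    3-component group into independent and common cause parts.

    q1: the component fails independently
    q2: a specific pair (this component and one other) fails together
    q3: all three components fail together
    """
    q1 = (1.0 - beta) * q_total
    q2 = 0.5 * beta * (1.0 - gamma) * q_total  # shared between 2 possible pairs
    q3 = beta * gamma * q_total
    return q1, q2, q3


# Hypothetical inputs: total failure probability and gamma are assumptions.
q_total = 1.0e-3
beta = 0.047   # demand Beta factor quoted earlier
gamma = 0.5    # illustrative value only

q1, q2, q3 = mgl_three_component(q_total, beta, gamma)
print(f"Q1={q1:.3e}  Q2={q2:.3e}  Q3={q3:.3e}")

# Consistency check: the parts must add back up to the total.
recombined = q1 + 2.0 * q2 + q3
```

The triple-failure term is smaller than the single Beta Model term (beta times Qt), which is the sense in which this model takes credit for the full redundancy.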
Common Cause Definition


° “In PRA, Common Cause Failures (CCFs) are failures of two or
more components, subsystems, or structures due to a single
specific event which bypassed or invalidated redundancy or
independence at the same time, or in a relatively short interval
like within a single mission.”


- May be the result of a design error, installation error, or maintenance 
error, or due to some adverse common environment 
- Sometimes called a generic failure. 


° Common Cause, as used in PRA, is not a single failure that takes
out multiple components such as a common power supply to
computers or common fluid header to multiple pumps.
- Single point failures such as these are modeled explicitly in a PRA
Some Basics on PRA and
Common Cause Failures 


° PRA


— PRA is used to perform “rare event” analysis 


If we had 1000 Space Stations operating for 50 years each and we had lost 60 of them we 
would not need to do a PRA to determine what the loss of station failure rate was 


However, we have only had one Station operating for ~ 10 years with no loss of station so 
methods like PRA are needed to estimate this value 


— Most of the components used in space vehicles are designed to have low failure rates,
and the limited numbers of these components mean that an actual failure rate number is
difficult to calculate from operational data (uncertainty is high!)


° Common Cause Parameters


— Beta is modeled as a fraction of the total failure rate.
> Total failure rate = Independent failure rate + Common cause failure rate
> Beta = Common cause failure rate / Total failure rate
> This is approximately equal to common cause failure rate / independent failure rate (when Beta is small)


— If you have a low failure rate for a component, the common cause failure rate will
be low too, but the component could still have a high Beta factor


— A failure rate is a rate, such as failures per hour, and a failure probability is derived from
the equation $1 - e^{-\lambda t}$, where $\lambda$ is the failure rate and $t$ is the exposure time. When $\lambda t$ is small, the equation
can be simplified using the rare event approximation, and we get failure probability $\approx \lambda t$.


Note: Beta is a parameter of a single modeling method, and there are several
modeling methods and variations; most work in similar fashion.
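
The relations on this slide can be checked numerically with a short Python sketch. The function names and the example rates and exposure time are hypothetical; only the formulas come from the text above.

```python
import math


def beta_from_rates(independent_rate: float, common_cause_rate: float) -> float:
    """Beta = common cause failure rate / total failure rate."""
    return common_cause_rate / (independent_rate + common_cause_rate)


def failure_probability(rate_per_hour: float, hours: float) -> float:
    """Exact failure probability, 1 - exp(-rate * time).

    -expm1(-x) equals 1 - exp(-x) but keeps precision when x is tiny.
    """
    return -math.expm1(-rate_per_hour * hours)


def rare_event_approximation(rate_per_hour: float, hours: float) -> float:
    """Approximate failure probability as rate * time (valid when small)."""
    return rate_per_hour * hours


# Hypothetical example: 1E-6 failures per hour over a 100 hour exposure.
exact = failure_probability(1.0e-6, 100.0)
approx = rare_event_approximation(1.0e-6, 100.0)
print(f"exact={exact:.6e}  approx={approx:.6e}")

# Hypothetical rates chosen so that Beta comes out at 0.047.
beta = beta_from_rates(independent_rate=9.53e-7, common_cause_rate=4.7e-8)
print(f"beta={beta:.3f}")
```

For a product of rate and time this small, the two probabilities differ by far less than one percent, which is why the approximation is normally acceptable for rare events.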


Types Of Common Cause Models 


Common Cause is modeled as a conditional probability, i.e.,
given that a component has failed, what is the probability
that another like component will fail?


Common models used are: 


- Beta (β) model — For a system with multiple like
components, Beta factor is used to estimate the probability 
of failure of all components (i.e. two or more) 


- Values for Beta can range from 1 to 0.0001 (or less), 
but more typical values are usually between 0.1 and 
0.001 


- Multiple Greek Letter (MGL) model — For systems with 3 or 
more like components, provides for a more explicit 
breakdown of possibilities, probabilities of two, three, four, 
etc. component failures 


- Alpha (α) model — Similar to the MGL model
Example Of Impact Of Modeling Common Cause


A system consisting of two trains:


[Figure: two fault trees side by side. Without considering common cause: basic events VALVE_A_FAILS (1.0E-3) and VALVE_B_FAILS (1.0E-3). Considering common cause (Beta): the same two basic events plus a common cause failure of two paths event (4.7E-5).]


Ignoring common cause results in a ~4.7E-05 underestimate of risk; the risk with
common cause is 48 times the risk without considering common cause.


Example Of Impact Of Modeling Common Cause


A system consisting of three trains:


[Figure: two fault trees side by side. Without considering common cause: failure of three paths through basic events VALVE_A_FAILS, VALVE_B_FAILS, and VALVE_C_FAILS. Considering common cause (Beta Model): the same three basic events plus a common cause event.]


Ignoring common cause results in a ~4.7E-05 underestimate of risk; the risk with
common cause is 47,000 times the risk without considering common cause.


Note: Using an MGL Model would reduce the result to 2.6E-05
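
The arithmetic behind the two-train and three-train examples can be reproduced with a small Python sketch. It assumes identical trains, each with the 1.0E-3 failure probability shown in the two-train figure, the 0.047 demand Beta quoted earlier, and a simple sum of the independent and common cause terms (a rare event approximation). The function name is hypothetical.

```python
def system_failure_probability(p_train, n_trains, beta=None):
    """Probability that all redundant trains fail.

    Without beta: product of the independent train failures.
    With beta: one common cause basic event (beta * p_train) is added,
    using the rare event approximation for the OR of the two terms.
    """
    independent = p_train ** n_trains
    if beta is None:
        return independent
    return independent + beta * p_train


P_VALVE = 1.0e-3  # per-valve failure probability from the examples above
BETA = 0.047      # demand Beta factor quoted earlier

ratios = {}
for n in (2, 3):
    without_ccf = system_failure_probability(P_VALVE, n)
    with_ccf = system_failure_probability(P_VALVE, n, BETA)
    ratios[n] = with_ccf / without_ccf
    print(f"{n} trains: without={without_ccf:.1e}  with={with_ccf:.1e}  "
          f"ratio={ratios[n]:,.0f}")
```

The ratios come out at about 48 for two trains and about 47,000 for three trains. Because the Beta Model term does not shrink as trains are added, it dominates the result more strongly the more redundancy there is.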


When Should You Do a PRA?


° As early in the design process as you can in order to 
affect the design and corresponding risk with 
minimal cost impact (i.e. to support Risk Informed 
Design (RID)) 


° When the risk of losing the project is greater than
the company can live with, whether due to loss of life or
for environmental or economic reasons


° To support Risk-Informed Decision Making (RIDM)
throughout a project’s life cycle from “formulation to 
implementation” or “concept to decommissioning” 
How much does a PRA cost?


° You can also ask, “How much will it cost to not
do a PRA?”


° The cost of a PRA is a function of the level of detail
desired as well as the size/complexity of the item
being assessed and the mission life cycle


— You should only model to the level of detail for which you have data
and no further. If you identify that significant risk exists at a
sublevel, then your PRA is telling you that you need to study that
level further. That further study may not be a PRA, but a reliability
assessment.


— Modeling a drilling rig is on a different scale than just the Blowout 
Preventer (BOP). However, understanding the need for a BOP 
can be important in its design and operation. 
Absolute vs Relative Risk?


° You may have heard, “Don’t believe the absolute risk estimate,
just the relative ranking.”


° Each event in a PRA is assessed as having a probability of
failure (since the PRA is performed in “failure space”).


— These failures are combined via the failure logic, which
determines how they are combined and the resulting scenarios.


— The failure probabilities of the events are used to establish the
probability of each scenario, which ranks the scenarios; the scenario
probabilities are also added to produce the overall risk.


— If different approaches and methods are used (which sometimes are 
needed in full scope PRAs), then the absolutes can be challenged and 
so may their rankings. This is where experienced PRA analysts earn 
their pay to help minimize the difference. 
° As a result, some decision makers or risk takers want to know
the overall risk, while others want to know how to reduce it by
working on the top risk drivers first.
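
The two views described above, an overall (absolute) risk and a relative ranking of risk drivers, can both be read from the same set of scenario probabilities. The Python sketch below is illustrative only: the scenario names and probabilities are made up, and the scenarios are simply summed, which assumes they are rare enough for the sum to approximate the overall risk.

```python
# Hypothetical scenario probabilities; names and values are placeholders.
scenarios = {
    "Scenario A": 4.7e-5,
    "Scenario B": 1.0e-6,
    "Scenario C": 2.0e-5,
}

# Absolute view: add the scenario probabilities to get the overall risk.
overall_risk = sum(scenarios.values())

# Relative view: rank scenarios from largest to smallest contributor.
ranked = sorted(scenarios.items(), key=lambda item: item[1], reverse=True)

print(f"Overall risk: {overall_risk:.2e}")
for name, probability in ranked:
    share = probability / overall_risk
    print(f"{name}: {probability:.1e} ({share:.0%} of overall risk)")
```

A decision maker interested in the absolute number reads the first line of output; one interested in where to work first reads the ranked list.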